Using HuggingFace Datasets in evaluations with preprocess_model_input
Note: This is a temporary workaround
This guide demonstrates a workaround for using HuggingFace Datasets with Weave evaluations.
We are actively developing more seamless integrations that will simplify this process. The approach below works today, but expect updates in the near future that will make working with external datasets more straightforward.
Setup and imports
First, we initialize Weave and connect to Weights & Biases for tracking experiments.
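A minimal sketch of this setup cell; the project name is a placeholder, so substitute your own W&B entity/project:

```python
import weave

# Initialize Weave so that subsequent ops and evaluations are tracked
# in Weights & Biases. "hf-weave-evals" is a placeholder project name.
weave.init("hf-weave-evals")
```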
Load and prepare HuggingFace dataset
- We load a HuggingFace dataset.
- We create an index mapping to reference the dataset rows.
- This index approach allows us to maintain references to the original dataset.
Note:
In the index, we encode the hf_hub_name along with the hf_id to ensure each row has a unique identifier. This unique digest value is used for tracking and referencing specific dataset entries during evaluations.
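A sketch of what this loading and indexing step can look like. The dataset (openai/gsm8k) is just an example; any HuggingFace dataset works the same way:

```python
from datasets import load_dataset

hf_hub_name = "openai/gsm8k"  # example dataset; substitute your own
hf_dataset = load_dataset(hf_hub_name, "main", split="test")

# Build a lightweight index instead of copying the dataset into Weave.
# Encoding both the hub name and the row id makes the content-based
# digest Weave computes for each row unique across datasets.
hf_index = [
    {"hf_hub_name": hf_hub_name, "hf_id": i}
    for i in range(len(hf_dataset))
]
```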
Define processing and evaluation functions
Processing pipeline
- preprocess_example: Transforms the index reference into the actual data needed for evaluation.
- hf_eval: Defines how to score the model outputs.
- function_to_evaluate: The actual function/model being evaluated.
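A sketch of the three pieces, continuing from the cells above. The column names (question, answer), the exact-match scoring, and the placeholder model are assumptions for illustration; in recent Weave versions, scorers receive the model result as an output argument alongside any dataset columns whose names match the scorer's parameters:

```python
import weave

@weave.op()
def preprocess_example(example: dict) -> dict:
    # Resolve the index entry back into a real dataset row. The keys
    # returned here become the keyword arguments of the evaluated function.
    row = hf_dataset[example["hf_id"]]
    return {"question": row["question"]}

@weave.op()
def hf_eval(hf_id: int, output: str) -> dict:
    # Score the model output against the reference answer for the same row.
    expected = hf_dataset[hf_id]["answer"]
    return {"exact_match": output.strip() == expected.strip()}

@weave.op()
def function_to_evaluate(question: str) -> str:
    # Stand-in "model": a real setup would call an LLM here.
    return "placeholder answer"
```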
Create and run evaluation
For each index entry in hf_index:
- preprocess_example gets the corresponding data from the HF dataset.
- The preprocessed data is passed to function_to_evaluate.
- The output is scored using hf_eval.
- Results are tracked in Weave.
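Putting it together, under the same assumptions as the earlier sketches (note that Evaluation.evaluate is async):

```python
import asyncio

import weave

# The index is the evaluation dataset; preprocess_model_input resolves
# each index entry to the actual model inputs before the model is called.
evaluation = weave.Evaluation(
    dataset=hf_index,
    scorers=[hf_eval],
    preprocess_model_input=preprocess_example,
)

# In a notebook you could `await evaluation.evaluate(...)` directly.
asyncio.run(evaluation.evaluate(function_to_evaluate))
```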